pacman::p_load(tmap, sf, DT, stplanr, tidyverse)Hands-on Exercise 10A
Processing & Visualizing Flow Data
1.0 Overview
In this exercise, we will build an OD matrix by using Passenger Volume by Origin Destination Bus Stops data set downloaded from LTA Data Mall.
2.0 Getting Started
For this exercise, we will use five r packages:
sf, fo importing, integrating, processing and transforming geospatial data
tidyverse, for importing, integrating, wrangling and visualizing data
tmap, for creating elegant thematic maps
stplanr, for solving common problems in transport planning and modelling
DT, which provides an R interface to the DataTables javascript library
3.0 Preparing the Flow Data
3.1 Importing the OD data
First, we will import the Passenger Volume by Origin Destination Bus Stops data set:
#|eval: false
odbus <- read_csv("data/aspatial/origin_destination_bus_202210.csv")Rows: 5122925 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): YEAR_MONTH, DAY_TYPE, PT_TYPE
dbl (4): TIME_PER_HOUR, ORIGIN_PT_CODE, DESTINATION_PT_CODE, TOTAL_TRIPS
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
write_rds(odbus, "data/rds/odbus.rds")To read in the saved rds file:
odbus <- read_rds("data/rds/odbus.rds")We can display the odbus tibble data table by using the code chunk below:
glimpse(odbus)Rows: 5,122,925
Columns: 7
$ YEAR_MONTH <chr> "2022-10", "2022-10", "2022-10", "2022-10", "2022-…
$ DAY_TYPE <chr> "WEEKDAY", "WEEKENDS/HOLIDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR <dbl> 10, 10, 7, 11, 16, 16, 20, 7, 7, 11, 11, 8, 11, 11…
$ PT_TYPE <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE <dbl> 65239, 65239, 23519, 52509, 54349, 54349, 43371, 8…
$ DESTINATION_PT_CODE <dbl> 65159, 65159, 23311, 42041, 53241, 53241, 14139, 9…
$ TOTAL_TRIPS <dbl> 2, 1, 2, 1, 1, 4, 1, 3, 1, 5, 2, 5, 15, 40, 1, 1, …
The code chunk below is used to convert the numeric data type fields into character data type fields:
odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE) 3.2 Extracting the study data
We will extract commuting flows on weekday and between 6 and 9 o’clock:
odbus6_9 <- odbus %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 6 &
TIME_PER_HOUR <= 9) %>%
group_by(ORIGIN_PT_CODE,
DESTINATION_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))`summarise()` has grouped output by 'ORIGIN_PT_CODE'. You can override using
the `.groups` argument.
The table below shows the contents of odbus6_9:
datatable(odbus6_9)Warning in instance$preRenderHook(instance): It seems your data is too big for
client-side DataTables. You may consider server-side processing:
https://rstudio.github.io/DT/server.html
We will save the output in rds:
#|eval: false
write_rds(odbus6_9, "data/rds/odbus6_9.rds")To read in the saved rds file:
odbus6_9 <- read_rds("data/rds/odbus6_9.rds")4.0 Working with Geospatial data
Two geospatial data will be used in this exercise, and they are both in ESRI shapefile format.
4.1 Importing geospatial data
We will import the two geospatial data first, and save them in rds:
#|eval: false
busstop <- st_read(dsn = "data/geospatial",
layer = "BusStop") %>%
st_transform(crs = 3414)Reading layer `BusStop' from data source
`C:\byebhai8\ISSS626-GeospatialAnalytics\Hands-On_Ex\Hands-on_Ex10\Hands-on_Ex10A\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 5159 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
write_rds(busstop, "data/rds/busstop.rds")#|eval: false
mpsz <- st_read(dsn = "data/geospatial",
layer = "MPSZ-2019") %>%
st_transform(crs = 3414)Reading layer `MPSZ-2019' from data source
`C:\byebhai8\ISSS626-GeospatialAnalytics\Hands-On_Ex\Hands-on_Ex10\Hands-on_Ex10A\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
write_rds(mpsz, "data/rds/mpsz.rds")To read in the saved rds files:
busstop <- read_rds("data/rds/busstop.rds")
mpsz <- read_rds("data/rds/mpsz.rds")5.0 Geospatial data wrangling
5.1 Combining busstop and mpsz
The code chunk below populates the planning subzone code of mpsz sf data frame into busstop sf data frame.
#|eval: false
busstop_mpsz <- st_intersection(busstop, mpsz) %>%
select(BUS_STOP_N, SUBZONE_C) %>%
st_drop_geometry()Warning: attribute variables are assumed to be spatially constant throughout
all geometries
write_rds(busstop_mpsz, "data/rds/busstop_mpsz.rds")To read in the saved rds files:
busstop_mpsz <- read_rds("data/rds/busstop_mpsz.rds")datatable(busstop_mpsz)Next, we will append the planning subzone code from busstop_mpsz data frame into odbus6_9 data frame.
od_data <- left_join(odbus6_9 , busstop_mpsz,
by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_SZ = SUBZONE_C,
DESTIN_BS = DESTINATION_PT_CODE)Warning in left_join(odbus6_9, busstop_mpsz, by = c(ORIGIN_PT_CODE = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 55491 of `x` matches multiple rows in `y`.
ℹ Row 161 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
It is good practice for us to check for duplicating records:
duplicate <- od_data %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()If duplicated records are found, the code chunk below will be used to retain the unique records:
od_data <- unique(od_data)It is good to check and confirm if the duplicating records issue has been addressed:
duplicate <- od_data %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()Next, we will update od_data with the planning subzone codes.
od_data <- left_join(od_data , busstop_mpsz,
by = c("DESTIN_BS" = "BUS_STOP_N")) Warning in left_join(od_data, busstop_mpsz, by = c(DESTIN_BS = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 74 of `x` matches multiple rows in `y`.
ℹ Row 1379 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
duplicate <- od_data %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()od_data <- unique(od_data)od_data <- od_data %>%
rename(DESTIN_SZ = SUBZONE_C) %>%
drop_na() %>%
group_by(ORIGIN_SZ, DESTIN_SZ) %>%
summarise(MORNING_PEAK = sum(TRIPS))`summarise()` has grouped output by 'ORIGIN_SZ'. You can override using the
`.groups` argument.
#|eval: false
write_rds(od_data, "data/rds/od_data_fii.rds")To read in the saved rds file:
od_data_fii <- read_rds("data/rds/od_data_fii.rds")6.0 Visualising Spatial Interaction
We will prepare a desire line by using stplanr package.
6.1 Removing intra-zonal flows
The code chunk below is used to remove the intra-zonal flows:
#|eval: false
od_data_fij <- od_data[od_data$ORIGIN_SZ!=od_data$DESTIN_SZ,]
write_rds(od_data_fij, "data/rds/od_data_fij.rds")To read in the saved rds file:
od_data_fij <- read_rds("data/rds/od_data_fij.rds")6.2 Creating desire lines
In the code chunk below, od2line() of stplanr package is used to create the desire lines:
#|eval: false
flowLine <- od2line(flow = od_data_fij,
zones = mpsz,
zone_code = "SUBZONE_C")Creating centroids representing desire line start and end points.
write_rds(flowLine, "data/rds/flowLine.rds")To read in the saved rds file:
flowLine <- read_rds("data/rds/flowLine.rds")6.3 Visualizing the desire lines
The code chunk below is used to visualize the resulting desire lines:
tm_shape(mpsz) +
tm_polygons() +
flowLine %>%
tm_shape() +
tm_lines(lwd = "MORNING_PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 10),
n = 6,
alpha = 0.3)Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length
Legend labels were too wide. Therefore, legend.text.size has been set to 0.66. Increase legend.width (argument of tm_layout) to make the legend wider and therefore the labels larger.

When the flow data is messy and highly skewed like the one above, it might be better to focus on selected flows. For example, we only want to focus on flows greater than or equal to 5000 as shown below:
tm_shape(mpsz) +
tm_polygons() +
flowLine %>%
filter(MORNING_PEAK >= 5000) %>%
tm_shape() +
tm_lines(lwd = "MORNING_PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 10),
n = 6,
alpha = 0.3)Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length
Legend labels were too wide. Therefore, legend.text.size has been set to 0.66. Increase legend.width (argument of tm_layout) to make the legend wider and therefore the labels larger.

x